Abstract. We study how iterated convolutions of probability measures compare under stochastic
domination. We give necessary and sufficient conditions for the existence of an integer $n$ such that
$\mu^{*n}$ is stochastically dominated by $\nu^{*n}$ for two given probability measures $\mu$ and $\nu$.
As a consequence we obtain a similar theorem on the majorization order for vectors in $\mathbb{R}^d$.
In particular we prove results about catalysis in quantum information theory.

Stochastic domination for iterated convolutions and quantum catalysis

Summary. We study how iterated convolutions of probability measures compare under stochastic
domination. We give necessary and sufficient conditions for the existence of an integer $n$ such that
$\mu^{*n}$ is stochastically dominated by $\nu^{*n}$, given two probability measures $\mu$ and $\nu$.
As a corollary we obtain a similar theorem for vectors of $\mathbb{R}^d$ and the Schur-domination
relation. More specifically, we prove results on catalysis in quantum information theory.



Introduction and notations 

This work is a continuation of [1], where we study the phenomenon of catalytic majorization in
quantum information theory. A probabilistic approach to this question involves stochastic domination,
which we introduce in Section 1, and its behavior with respect to the convolution of measures. We
give in Section 2 a condition on measures $\mu$ and $\nu$ for the existence of an integer $n$ such that
$\mu^{*n}$ is stochastically dominated by $\nu^{*n}$. We gather further topological and geometrical
aspects in Section 3. Finally, we apply these results to our original problem of catalytic majorization.
In Section 4 we introduce the background for quantum catalytic majorization and we state our results.
Section 5 contains the proofs and in Section 6 we consider an infinite dimensional version of catalysis.

We now introduce some notation and recall basic facts about probability measures. We write
$\mathcal{P}(\mathbb{R})$ for the set of probability measures on $\mathbb{R}$. We denote by $\delta_x$
the Dirac mass at point $x$. If $\mu \in \mathcal{P}(\mathbb{R})$, we write $\operatorname{supp} \mu$ for
the support of $\mu$. We write respectively $\min \mu \in [-\infty, +\infty)$ and
$\max \mu \in (-\infty, +\infty]$ for $\min \operatorname{supp} \mu$ and $\max \operatorname{supp} \mu$.
We also write $\mu(a, b)$ and $\mu[a, b]$ as a shortcut for $\mu((a, b))$ and $\mu([a, b])$.
The convolution of two measures $\mu$ and $\nu$ is denoted $\mu * \nu$. Recall that if $X$ and $Y$ are
independent random variables of respective laws $\mu$ and $\nu$, the law of $X + Y$ is given by
$\mu * \nu$. The results of this paper are stated for convolutions of measures; they admit immediate
translations in the language of sums of independent random variables. For $\lambda \in \mathbb{R}$, the
function $e_\lambda$ is defined by $e_\lambda(x) = \exp(\lambda x)$.

1. Stochastic domination 
A natural way of comparing two probability measures is given by the following relation.

Definition 1.1. Let $\mu$ and $\nu$ be two probability measures on the real line. We say that $\mu$ is
stochastically dominated by $\nu$ and we write $\mu \leq_{st} \nu$ if

(1) $\forall t \in \mathbb{R}, \quad \mu[t, \infty) \leq \nu[t, \infty).$
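
For finitely supported measures, condition (1) can be checked mechanically. The following is a minimal sketch, not part of the original text: it assumes a measure is stored as a dictionary `{atom: mass}`, and the names `tail` and `dominated` are illustrative choices. Since both tails are step functions that only jump at atoms, it is enough to test (1) at the atoms of the two measures.

```python
from fractions import Fraction

def tail(mu, t):
    """Return mu[t, +oo) for a measure given as {atom: mass}."""
    return sum(mass for atom, mass in mu.items() if atom >= t)

def dominated(mu, nu):
    """Check mu <=_st nu, i.e. mu[t, +oo) <= nu[t, +oo) for every real t.

    Both tails are left-continuous step functions that only change at
    atoms, so comparing them at the atoms of mu and nu is sufficient.
    """
    return all(tail(mu, t) <= tail(nu, t) for t in set(mu) | set(nu))

half = Fraction(1, 2)
mu = {0: half, 1: half}   # uniform measure on {0, 1}
nu = {1: half, 2: half}   # the same measure shifted by 1
print(dominated(mu, nu), dominated(nu, mu))
```

Exact rational masses are used so that equalities between tails are not disturbed by rounding.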

Stochastic domination is an order relation on $\mathcal{P}(\mathbb{R})$ (in particular, $\mu \leq_{st} \nu$ and $\nu \leq_{st} \mu$ imply $\mu = \nu$).
The following result [III [9] provides useful characterizations of stochastic domination. 

Theorem. Let $\mu$ and $\nu$ be probability measures on the real line. The following are equivalent:

(1) $\mu \leq_{st} \nu$.

(2) Sample path characterization. There exists a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and two random
variables $X$ and $Y$ on $\Omega$ with respective laws $\mu$ and $\nu$, so that

$$\forall \omega \in \Omega, \quad X(\omega) \leq Y(\omega).$$

(3) Functional characterization. For any increasing function $f : \mathbb{R} \to \mathbb{R}$ so that both integrals
exist,

$$\int f \, d\mu \leq \int f \, d\nu.$$

It is easily checked that stochastic domination is well-behaved with respect to convolution. 

Lemma 1.2. Let $\mu_1, \mu_2, \nu_1, \nu_2$ be probability measures on the real line. If $\mu_1 \leq_{st} \nu_1$ and $\mu_2 \leq_{st} \nu_2$,
then $\mu_1 * \mu_2 \leq_{st} \nu_1 * \nu_2$.

Lemma 1.3. Let $\mu$ and $\nu$ be two probability measures on the real line such that $\mu \leq_{st} \nu$. Then, for all

For fixed $\mu$ and $\nu$, it follows from Lemma 1.2 that the set of integers $k$ so that $\mu^{*k} \leq_{st} \nu^{*k}$ is stable
under addition. In general $\mu^{*n} \leq_{st} \nu^{*n}$ does not imply $\mu^{*(n+1)} \leq_{st} \nu^{*(n+1)}$. Here is a typical example.

Example 1.4. Let $\mu$ and $\nu$ be the probability measures defined as

$$\mu = 0.4\,\delta_0 + 0.6\,\delta_2,$$

$$\nu = 0.8\,\delta_1 + 0.2\,\delta_3.$$

It is straightforward to verify (see Figure 1) that

• For $k = 2$, and therefore for all even $k$, we have $\mu^{*k} \leq_{st} \nu^{*k}$.

• For $k$ odd, we have $\mu^{*k} \leq_{st} \nu^{*k}$ only for $k \geq 9$.




Figure 1. Cumulative distribution functions of $\mu^{*k}$ (solid line) and $\nu^{*k}$ (dotted
line) from Example 1.4 for $k = 1, 2, 3, 9$.
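
The claims of Example 1.4 can be checked by brute force. The sketch below is an illustration rather than the authors' computation: it represents the two measures of the example as dictionaries with exact rational masses, and the helper names (`convolve`, `power`, `dominated`) are assumptions of this sketch.

```python
from fractions import Fraction

def convolve(mu, nu):
    """Convolution of two finitely supported measures {atom: mass}."""
    out = {}
    for a, p in mu.items():
        for b, q in nu.items():
            out[a + b] = out.get(a + b, 0) + p * q
    return out

def power(mu, k):
    """k-fold convolution power mu^{*k}; the 0-th power is the Dirac mass at 0."""
    out = {0: Fraction(1)}
    for _ in range(k):
        out = convolve(out, mu)
    return out

def tail(mu, t):
    return sum(mass for atom, mass in mu.items() if atom >= t)

def dominated(mu, nu):
    """mu <=_st nu, tested at the atoms of both measures."""
    return all(tail(mu, t) <= tail(nu, t) for t in set(mu) | set(nu))

mu = {0: Fraction(2, 5), 2: Fraction(3, 5)}   # 0.4 delta_0 + 0.6 delta_2
nu = {1: Fraction(4, 5), 3: Fraction(1, 5)}   # 0.8 delta_1 + 0.2 delta_3

good = [k for k in range(1, 13) if dominated(power(mu, k), power(nu, k))]
print(good)
```

The list `good` collects the exponents $k \leq 12$ for which $\mu^{*k} \leq_{st} \nu^{*k}$; it contains every even $k$ and, among odd $k$, only those from 9 on.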
Other examples show that the minimal $n$ so that $\mu^{*n} \leq_{st} \nu^{*n}$ can be arbitrarily large. This is the
content of the next proposition.

Proposition 1.5. For every integer $n$, there exist compactly supported probability measures $\mu$ and $\nu$
such that $\mu^{*n} \leq_{st} \nu^{*n}$ and, for all $1 \leq k \leq n - 1$, $\mu^{*k} \not\leq_{st} \nu^{*k}$.

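Proof. Let $\mu = \varepsilon \delta_{-2n} + (1 - \varepsilon)\delta_1$ and $\nu$ be the uniform measure on $[0, 2]$, where $0 < \varepsilon < 1$ will be
defined later. For $k \geq 1$,

$$\mu^{*k} = \sum_{i=0}^{k} \binom{k}{i} (1 - \varepsilon)^i \varepsilon^{k-i} \, \delta_{i - 2n(k-i)}.$$

Note that $\operatorname{supp}(\nu^{*k}) \subset \mathbb{R}_+$, while for $1 \leq k \leq n$, the only part of $\mu^{*k}$ charging $\mathbb{R}_+$ is the Dirac mass
at point $k$. This implies that

$$\mu^{*k} \leq_{st} \nu^{*k} \iff \mu^{*k}[k, +\infty) \leq \nu^{*k}[k, +\infty).$$

We have $\mu^{*k}[k, +\infty) = (1 - \varepsilon)^k$ and $\nu^{*k}[k, +\infty) = 1/2$. It remains to choose $\varepsilon$ so that
$(1 - \varepsilon)^n \leq 1/2 < (1 - \varepsilon)^{n-1}$. □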
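
The last step of the proof only requires $(1 - \varepsilon)^n \leq 1/2 < (1 - \varepsilon)^{n-1}$. As an illustration, the sketch below picks one admissible value, $1 - \varepsilon = 2^{-1/(n - 1/2)}$; this particular choice and the function names are assumptions of the sketch, not taken from the text. It then confirms that the first exponent $k \leq n$ with $(1 - \varepsilon)^k \leq 1/2$ is exactly $n$.

```python
def one_minus_eps(n):
    """One admissible value of 1 - eps: q = 2**(-1/(n - 1/2)).

    Then q**n < 1/2 < q**(n-1), with a margin that keeps the comparison
    safe in floating point for moderate n.
    """
    return 2.0 ** (-1.0 / (n - 0.5))

def smallest_dominated_power(n):
    """Smallest k in 1..n with (1 - eps)**k <= 1/2.

    By the equivalence in the proof, this is the smallest k <= n with
    mu^{*k} <=_st nu^{*k} for the measures of Proposition 1.5.
    """
    q = one_minus_eps(n)
    return next(k for k in range(1, n + 1) if q ** k <= 0.5)

print([smallest_dominated_power(n) for n in range(1, 8)])
```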

2. Stochastic domination for iterated convolutions and Cramer's theorem 

In light of the previous examples, we are going to study the following extension of stochastic domination.

Definition 2.1. We define a relation $\leq_{st}^{*}$ on $\mathcal{P}(\mathbb{R})$ as follows:

$$\mu \leq_{st}^{*} \nu \iff \exists n \geq 1 \text{ s.t. } \mu^{*n} \leq_{st} \nu^{*n}.$$

It turns out that when defined on $\mathcal{P}(\mathbb{R})$, this relation is not an order relation due to pathological
poorly integrable measures. Indeed, there exist two probability measures $\mu$ and $\nu$ so that $\mu \neq \nu$ and
$\mu * \mu = \nu * \nu$ (see [7], p. 479). Therefore, the relation $\leq_{st}^{*}$ is not anti-symmetric. For this reason,
we restrict ourselves to sufficiently integrable measures (however, most of what follows generalizes to
wider classes of measures). This is quite usual when studying orderings of probability measures, see
[17] for examples of such situations.

Definition 2.2. A measure $\mu$ on $\mathbb{R}$ is said to be exponentially integrable if $\int e_\lambda \, d\mu < +\infty$ for all $\lambda \in \mathbb{R}$
(recall that $e_\lambda(x) = \exp(\lambda x)$). We write $\mathcal{P}_{\exp}(\mathbb{R})$ for the set of exponentially integrable probability
measures.

Notice that the space of exponentially integrable measures is stable under convolution. 

Proposition 2.3. When restricted to $\mathcal{P}_{\exp}(\mathbb{R})$, the relation $\leq_{st}^{*}$ is a partial order.

Proof. One has to check only the antisymmetry property, the other two being obvious. Let $k$ and $l$
be two integers such that $\mu^{*k} \leq_{st} \nu^{*k}$ and $\nu^{*l} \leq_{st} \mu^{*l}$. Then $\mu^{*kl} \leq_{st} \nu^{*kl} \leq_{st} \mu^{*kl}$ and therefore
$\mu^{*kl} = \nu^{*kl}$. Since $\mu$ and $\nu$ are exponentially integrable, this implies that $\mu = \nu$. One can see this
in the following way: if we denote the moments of $\mu$ by $m_p(\mu) = \int x^p \, d\mu(x)$, one checks by induction
on $p$ that $m_p(\mu) = m_p(\nu)$ for all $p \in \mathbb{N}$. On the other hand, exponential integrability implies that
$m_{2p}(\mu)^{1/2p} \leq Cp$ for some constant $C$, so that Carleman's condition is satisfied (see [7], p. 224).
Therefore $\mu$ is determined by its moments and $\mu = \nu$. □

We would like to give a description of the relation $\leq_{st}^{*}$, for example similar to the functional
characterization of $\leq_{st}$. We start with the following lemma.

Lemma 2.4. Let $\mu, \nu \in \mathcal{P}_{\exp}(\mathbb{R})$ such that $\mu \leq_{st}^{*} \nu$. Then the following inequalities hold:

(a) $\forall \lambda \geq 0, \ \int e_\lambda \, d\mu \leq \int e_\lambda \, d\nu$,

(b) $\forall \lambda \leq 0, \ \int e_\lambda \, d\mu \geq \int e_\lambda \, d\nu$,

(c) $\int x \, d\mu(x) \leq \int x \, d\nu(x)$,

(d) $\min \mu \leq \min \nu$,

(e) $\max \mu \leq \max \nu$.

Proof. Let $\mu \leq_{st}^{*} \nu$ and $\lambda \geq 0$. Since $\mu^{*n} \leq_{st} \nu^{*n}$ for some $n$, we get from the functional characterization
of $\leq_{st}$ that

$$\int e_\lambda \, d\mu^{*n} \leq \int e_\lambda \, d\nu^{*n}.$$

It remains to notice that

$$\int e_\lambda \, d\mu^{*n} = \left( \int e_\lambda \, d\mu \right)^n$$

and we get (a). The proof of (b) is completely symmetric, while (c) follows also from the functional
characterization. Conditions (d) and (e) are obvious since $\min(\mu^{*n}) = n \min(\mu)$ and $\max(\mu^{*n}) = n \max(\mu)$. □
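
The necessary conditions of Lemma 2.4 are easy to test numerically for finitely supported measures. The sketch below is illustrative only: it evaluates conditions (a) and (b) on a finite grid of values of $\lambda$, which gives numerical evidence rather than a proof, and it uses the measures $0.4\,\delta_0 + 0.6\,\delta_2$ and $0.8\,\delta_1 + 0.2\,\delta_3$ of Example 1.4 as test data. The function names are assumptions of this sketch.

```python
import math

def laplace(mu, lam):
    """Integral of e_lam(x) = exp(lam * x) against mu = {atom: mass}."""
    return sum(p * math.exp(lam * a) for a, p in mu.items())

def mean(mu):
    return sum(p * a for a, p in mu.items())

def necessary_conditions(mu, nu, lams):
    """Conditions (a)-(e) of Lemma 2.4, with (a), (b) tested only on the grid lams."""
    ok_a = all(laplace(mu, l) <= laplace(nu, l) for l in lams if l > 0)
    ok_b = all(laplace(mu, l) >= laplace(nu, l) for l in lams if l < 0)
    ok_c = mean(mu) <= mean(nu)
    ok_d = min(mu) <= min(nu)   # smallest atoms
    ok_e = max(mu) <= max(nu)   # largest atoms
    return ok_a and ok_b and ok_c and ok_d and ok_e

mu = {0: 0.4, 2: 0.6}
nu = {1: 0.8, 3: 0.2}
lams = [i / 10 for i in range(-50, 51) if i != 0]
print(necessary_conditions(mu, nu, lams), necessary_conditions(nu, mu, lams))
```

For this pair one can also argue by hand: with $u = e^\lambda$, the difference of the two integrals is $0.2\,(u - 1)(u^2 - 2u + 2)$, which has the sign of $\lambda$.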



The following proposition shows that the necessary conditions of Lemma 2.4 are "almost sufficient".

Proposition 2.5. Let $\mu, \nu \in \mathcal{P}_{\exp}(\mathbb{R})$. Assume that the following inequalities hold:

(a) $\forall \lambda > 0, \ \int e_\lambda \, d\mu < \int e_\lambda \, d\nu$.

(b) $\forall \lambda < 0, \ \int e_\lambda \, d\mu > \int e_\lambda \, d\nu$.

(c) $\int x \, d\mu(x) < \int x \, d\nu(x)$.

(d) $\max \mu < \max \nu$.

(e) $\min \mu < \min \nu$.

Then $\mu \leq_{st}^{*} \nu$, and more precisely there exists an integer $N \in \mathbb{N}$ such that for any $n \geq N$, $\mu^{*n} \leq_{st} \nu^{*n}$.

We give in Proposition 3.6 a counter-example showing that Proposition 2.5 is not true when stated
with non-strict inequalities.

We are going to use Cramer's theorem on large deviations. The cumulant generating function $\Lambda_\mu$
of the probability measure $\mu$ is defined for any $\lambda \in \mathbb{R}$ by

$$\Lambda_\mu(\lambda) = \log \int e_\lambda \, d\mu.$$

It is a convex function taking values in $\mathbb{R}$. Its convex conjugate $\Lambda_\mu^*$, sometimes called the Cramer
transform, is defined as

$$\Lambda_\mu^*(t) = \sup_{\lambda \in \mathbb{R}} \, \lambda t - \Lambda_\mu(\lambda).$$

Note that $\Lambda_\mu^* : \mathbb{R} \to [0, +\infty]$ is a smooth convex function, which takes the value $+\infty$ on
$\mathbb{R} \setminus [\min \mu, \max \mu]$. Moreover, for $t \in (\min \mu, \max \mu)$, the supremum in the definition of $\Lambda_\mu^*(t)$ is
attained at a unique point $\lambda_t$. Moreover, $\lambda_t > 0$ if $t > \int x \, d\mu(x)$ and $\lambda_t < 0$ if $t < \int x \, d\mu(x)$. Also,
$\Lambda_\mu^*(\int x \, d\mu(x)) = 0$ since $\Lambda_\mu'(0) = \int x \, d\mu(x)$. We now state Cramer's theorem. The theorem can be
equivalently stated in the language of sums of i.i.d. random variables 0[9]. 

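Theorem (Cramer's theorem). Let $\mu \in \mathcal{P}_{\exp}(\mathbb{R})$. Then for any $t \in \mathbb{R}$,

(2) $\displaystyle \lim_{n \to \infty} \frac{1}{n} \log \mu^{*n}[tn, +\infty) = \begin{cases} 0 & \text{if } t \leq \int x \, d\mu(x), \\ -\Lambda_\mu^*(t) & \text{otherwise.} \end{cases}$

(3) $\displaystyle \lim_{n \to \infty} \frac{1}{n} \log \left( 1 - \mu^{*n}[tn, +\infty) \right) = \begin{cases} 0 & \text{if } t \geq \int x \, d\mu(x), \\ -\Lambda_\mu^*(t) & \text{otherwise.} \end{cases}$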
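
To make the Cramer transform concrete, here is a small numerical sketch. It is an illustration under stated assumptions: the measure is finitely supported and stored as `{atom: mass}`, the supremum is located by a ternary search on a fixed interval (valid because $\lambda \mapsto \lambda t - \Lambda_\mu(\lambda)$ is concave), and the Bernoulli test case with its relative-entropy closed form is a standard fact that does not come from the text.

```python
import math

def cgf(mu, lam):
    """Cumulant generating function: log of the integral of exp(lam * x) d mu."""
    return math.log(sum(p * math.exp(lam * a) for a, p in mu.items()))

def cramer_transform(mu, t, lo=-50.0, hi=50.0, iters=200):
    """Approximate sup over lam of lam * t - cgf(mu, lam) by ternary search.

    Assumes the maximizer lies in [lo, hi], which holds for t strictly
    between min(mu) and max(mu) when the interval is wide enough.
    """
    g = lambda lam: lam * t - cgf(mu, lam)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return g((lo + hi) / 2)

def bernoulli_rate(t, p):
    """Closed form of the Cramer transform of (1-p) delta_0 + p delta_1 for 0 < t < 1."""
    return t * math.log(t / p) + (1 - t) * math.log((1 - t) / (1 - p))

mu = {0: 0.7, 1: 0.3}
print(cramer_transform(mu, 0.5), bernoulli_rate(0.5, 0.3))
print(cramer_transform(mu, 0.3))   # the transform vanishes at the mean
```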
Proof of Proposition 2.5. Note that the hypotheses imply that the quantities $\max \mu$ and $\min \nu$ are
finite. We write also $M_\mu = \int x \, d\mu(x)$ and $M_\nu = \int x \, d\nu(x)$. For $n \geq 1$, define $(f_n)$ and $(g_n)$ by

$$f_n(t) = \mu^{*n}[tn, +\infty),$$

$$g_n(t) = \nu^{*n}[tn, +\infty).$$

We need to prove that $f_n \leq g_n$ on $\mathbb{R}$ for $n$ large enough. If $t > \max \mu$, the inequality is trivial since
$f_n(t) = 0$. Similarly, if $t < \min \nu$ we have $g_n(t) = 1$ and there is nothing to prove.

Fix a real number $t_0$ such that $M_\mu < t_0 < M_\nu$. We first work on the interval $I = [t_0, \max \mu]$. By
Cramer's theorem, the sequences $(f_n^{1/n})$ and $(g_n^{1/n})$ converge respectively on $I$ toward $f$ and $g$ defined
by

$$f(t) = \exp(-\Lambda_\mu^*(t)),$$

$$g(t) = \begin{cases} 1 & \text{if } t_0 \leq t \leq M_\nu, \\ \exp(-\Lambda_\nu^*(t)) & \text{if } M_\nu \leq t \leq \max \mu. \end{cases}$$

Note that $f$ and $g$ are continuous on $I$. We claim also that $f < g$ on $I$. The inequality is clear on
$[t_0, M_\nu]$ since $f < 1$. If $t \in [M_\nu, \max \mu]$, note that the supremum in the definition of $\Lambda_\nu^*(t)$ is attained
for some $\lambda \geq 0$ (to show this we used hypothesis (d)). Using (a) and the definition of the convex
conjugate, it implies that $\Lambda_\mu^*(t) > \Lambda_\nu^*(t)$. We now use the following elementary fact: if a sequence
of non-increasing functions defined on a compact interval $I$ converges pointwise toward a continuous
limit, then the convergence is actually uniform on $I$ (for a proof see [16] Part 2, Problem 127; this
statement is attributed to Polya or to Dini depending on authors). We apply this result to both $(f_n^{1/n})$
and $(g_n^{1/n})$; and since $f < g$, uniform convergence implies that for $n$ large enough, $f_n^{1/n} < g_n^{1/n}$ on $I$,
and thus $f_n < g_n$.

Finally, we apply a similar argument on the interval $J = [\min \nu, t_0]$, except that we consider the
sequences $(1 - f_n)^{1/n}$ and $(1 - g_n)^{1/n}$, and we use (3) to compute the limit. We omit the details since
the argument is totally symmetric.

We have thus shown that for $n$ large enough, $f_n \leq g_n$ on $I \cup J$, and thus on $\mathbb{R}$. This is exactly
the conclusion of the proposition. □

3. Geometry and topology of $\leq_{st}^{*}$

We investigate here the topology of the relation $\leq_{st}^{*}$. We first need to define an adequate topology
on $\mathcal{P}_{\exp}(\mathbb{R})$. This space can be topologized in several ways, an important point for us being that the
map $\mu \mapsto \int e_\lambda \, d\mu$ should be continuous.

Definition 3.1. A function $f : \mathbb{R} \to \mathbb{R}$ is said to be subexponential if there exist constants $c, C$ so
that for every $x \in \mathbb{R}$

$$|f(x)| \leq C \exp(c|x|).$$

Definition 3.2. Let $\tau$ be the topology defined on the space of exponentially integrable measures,
generated by the family of seminorms $(N_f)$

$$N_f(\mu) = \left| \int f \, d\mu \right|,$$

where $f$ belongs to the class of continuous subexponential functions.

The topology $\tau$ is a locally convex vector space topology. It can be shown that the relation $\leq_{st}^{*}$ is
not $\tau$-closed (see Proposition 3.6). However, we can give a functional characterization of its closure.
This is the content of the following theorem.

Theorem 3.3. Let $R \subset \mathcal{P}_{\exp}(\mathbb{R})^2$ be the set of couples $(\mu, \nu)$ of exponentially integrable probability
measures so that $\mu \leq_{st}^{*} \nu$. Then

(4) $\displaystyle \overline{R} = \left\{ (\mu, \nu) \in \mathcal{P}_{\exp}(\mathbb{R})^2 \text{ s.t. } \forall \lambda \geq 0, \int e_\lambda \, d\mu \leq \int e_\lambda \, d\nu \text{ and } \forall \lambda \leq 0, \int e_\lambda \, d\mu \geq \int e_\lambda \, d\nu \right\},$

the closure being taken with respect to the topology $\tau$.

Proof. Let us write $X$ for the set on the right-hand side of (4). We get from Lemma 2.4 that $R \subset X$.
Moreover, it is easily checked that $X$ is $\tau$-closed, therefore $\overline{R} \subset X$. Conversely, we are going to
show that the set of couples $(\mu, \nu)$ satisfying the hypotheses of Proposition 2.5 is $\tau$-dense in $X$. Let
$(\mu, \nu) \in X$. We get from the inequalities satisfied by $\mu$ and $\nu$ that

• $\int x \, d\mu(x) \leq \int x \, d\nu(x)$ (taking derivatives at $\lambda = 0$),

• $\min \mu \leq \min \nu$ (taking $\lambda \to -\infty$),

• $\max \mu \leq \max \nu$ (taking $\lambda \to +\infty$).

We want to define two sequences $(\mu_n, \nu_n)$ which $\tau$-converge toward $(\mu, \nu)$, with $\mu_n \leq_{st} \mu$ and $\nu \leq_{st} \nu_n$
and for which the above inequalities become strict. Assume for example that $\max \mu = \max \nu = +\infty$
and $\min \mu = \min \nu = -\infty$. Then we can define $\mu_n$ and $\nu_n$ as follows: let $\varepsilon_n = \mu[n, +\infty)$ and
$\eta_n = \nu(-\infty, -n]$, and set

$$\mu_n = \mu|_{(-\infty, n)} + \varepsilon_n \delta_n,$$
$$\nu_n = \nu|_{(-n, +\infty)} + \eta_n \delta_{-n}.$$

We check using dominated convergence that $\lim \mu_n = \mu$ and $\lim \nu_n = \nu$ with respect to $\tau$, while by
Proposition 2.5 we have $\mu_n \leq_{st}^{*} \nu_n$. The other cases are treated in a similar way: we can always play
with small Dirac masses to make all inequalities strict (for example, if $\max \mu = \max \nu = M < +\infty$,
replace $\nu$ by $(1 - \varepsilon)\nu + \varepsilon \delta_{M+1}$, and so on). □

A more comfortable way of describing the relation $\leq_{st}^{*}$ is given by the following sets.

Definition 3.4. Let $\nu \in \mathcal{P}_{\exp}(\mathbb{R})$. We define $D(\nu)$ to be the following set:

$$D(\nu) = \{ \mu \in \mathcal{P}_{\exp}(\mathbb{R}) \text{ s.t. } \mu \leq_{st}^{*} \nu \}.$$

Using the ideas in the proof of Theorem 3.3, it can easily be shown that for $\nu \in \mathcal{P}_{\exp}(\mathbb{R})$ such that
$\min \nu > -\infty$, one has

(5) $\displaystyle \overline{D(\nu)} = \left\{ \mu \in \mathcal{P}_{\exp}(\mathbb{R}) \text{ s.t. } \forall \lambda \geq 0, \int e_\lambda \, d\mu \leq \int e_\lambda \, d\nu \text{ and } \forall \lambda \leq 0, \int e_\lambda \, d\mu \geq \int e_\lambda \, d\nu \right\},$

where the closure is taken in the topology $\tau$. However, for measures $\nu$ with $\min \nu = -\infty$, the condition
(e) of Proposition 2.5 is violated and we do not know if the relation (5) holds.

Another consequence of equation (5) is that the $\tau$-closure of $D(\nu)$ is a convex set. It is not clear
that the set $D(\nu)$ itself is convex. We shall see in Proposition 3.7 that this is not the case in general
for measures $\nu \notin \mathcal{P}_{\exp}(\mathbb{R})$. Note also that for fixed $\nu \in \mathcal{P}(\mathbb{R})$ the set $\{ \mu \in \mathcal{P}(\mathbb{R}) \text{ s.t. } \mu \leq_{st} \nu \}$ is easily
checked to be convex.

Remark 3.5. One can analogously define for $\mu \in \mathcal{P}_{\exp}(\mathbb{R})$ the "dual" set

$$E(\mu) = \{ \nu \in \mathcal{P}_{\exp}(\mathbb{R}) \text{ s.t. } \mu \leq_{st}^{*} \nu \}.$$

Results about $D(\nu)$ or $E(\mu)$ are equivalent. Indeed, let $\mu^{\leftrightarrow}$ be the measure defined for a Borel set $B$ by
$\mu^{\leftrightarrow}(B) = \mu(-B)$. We have $\mu \leq_{st}^{*} \nu \iff \nu^{\leftrightarrow} \leq_{st}^{*} \mu^{\leftrightarrow}$ and therefore $E(\mu) = D(\mu^{\leftrightarrow})^{\leftrightarrow}$.

We now give an example showing that the relation $\leq_{st}^{*}$ is not $\tau$-closed.

Proposition 3.6. There exists a probability measure $\nu \in \mathcal{P}_{\exp}(\mathbb{R})$ so that the set $D(\nu)$ is not $\tau$-closed.
Consequently, the set $R$ appearing in (4) is not closed either.

Proof. Let us start with a simplified sketch of the proof. By the examples of Section 1, for each positive
integer $k$, one can find probability measures $\mu_k$ and $\nu_k$ such that $\mu_k \in D(\nu_k)$, while $\mu_k^{*k} \not\leq_{st} \nu_k^{*k}$. We
sum properly rescaled and normalized versions of these measures in order to obtain two probability
measures $\mu$ and $\nu$ such that $\mu \notin D(\nu)$. However, successive approximations $\mu_n$ of $\mu$ are shown to
satisfy $\mu_n \leq_{st}^{*} \nu$, which implies $\mu \in \overline{D(\nu)}$ and thus $D(\nu) \neq \overline{D(\nu)}$.

We now work out the details. For $k \geq 1$, let $a_k = (k + 2)!$, $b_k = (k + 2)! + 1$ and $\gamma_k = c \exp(-k^k)$,
where the constant $c$ is chosen so that $\sum \gamma_k = 1$. We check that $(a_k)$ and $(b_k)$ satisfy the following
inequalities:

(6) $(k - 1) b_k + b_{k-1} < k a_k,$

(7) $k b_k < a_{k+1}.$
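
Inequalities (6) and (7) are elementary consequences of the growth of the factorial; the following sketch simply evaluates them for the first few values of $k$. It assumes the formulas $a_k = (k+2)!$ and $b_k = (k+2)! + 1$ are also used for $k = 0$, which is needed to read (6) at $k = 1$; the function names are illustrative.

```python
from math import factorial

def a(k):
    return factorial(k + 2)

def b(k):
    return factorial(k + 2) + 1

def inequality_6(k):
    """(k - 1) * b_k + b_{k-1} < k * a_k"""
    return (k - 1) * b(k) + b(k - 1) < k * a(k)

def inequality_7(k):
    """k * b_k < a_{k+1}"""
    return k * b(k) < a(k + 1)

print(all(inequality_6(k) and inequality_7(k) for k in range(1, 30)))
```

Indeed, (6) reduces to $(k+1)! + k < (k+2)!$ and (7) to $k < 3\,(k+2)!$.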



It follows from Proposition 1.5 that for each $k \in \mathbb{N}$ there exist $\mu_k$ and $\nu_k$, probability measures
with compact support such that $\mu_k \in D(\nu_k)$ while $\mu_k^{*k} \not\leq_{st} \nu_k^{*k}$. Moreover, we can assume that
$\operatorname{supp}(\mu_k) \subset (a_k, b_k)$ and $\operatorname{supp}(\nu_k) \subset (a_k, b_k)$. Indeed, we can apply to both measures a suitable affine
transformation (increasing affine transformations preserve stochastic domination and are compatible
with convolution). We now define $\mu$ and $\nu$ as

$$\mu = \sum_{k=1}^{\infty} \gamma_k \mu_k \quad \text{and} \quad \nu = \sum_{k=1}^{\infty} \gamma_k \nu_k.$$

Note that the sequence $(\gamma_k)$ has been chosen to tend very quickly to $0$ to ensure that $\mu$ and $\nu$ are
exponentially integrable. We also introduce the following sequences of measures:

$$\mu_n = \sum_{k=1}^{n} \gamma_k \mu_k + \left( \sum_{k=n+1}^{\infty} \gamma_k \right) \delta_0,$$

$$\nu_n = \sum_{k=1}^{n} \gamma_k \nu_k + \left( \sum_{k=n+1}^{\infty} \gamma_k \right) \delta_0.$$

One checks using Lebesgue's dominated convergence theorem that the sequences $(\mu_n)$ and $(\nu_n)$ converge
respectively toward $\mu$ and $\nu$ for the topology $\tau$. Note also that these sequences are increasing with respect
to stochastic domination, so that $\nu_n \leq_{st} \nu$. For fixed $k$, $\mu_k$ and $\nu_k$ satisfy the hypotheses of Proposition
2.5 and thus the same holds for $\mu_n$ and $\nu_n$. Therefore $\mu_n \in D(\nu_n) \subset D(\nu)$. This proves that $\mu \in \overline{D(\nu)}$.
We now prove by contradiction that $\mu \notin D(\nu)$. Assume that $\mu \in D(\nu)$, i.e. $\mu^{*k} \leq_{st} \nu^{*k}$ for some
$k \geq 1$. Let $s_k = k a_k$ and $t_k = k b_k$. Fix a sequence $i_1, \ldots, i_k$ of nonzero integers. Set $m = \mu_{i_1} * \cdots * \mu_{i_k}$
or $m = \nu_{i_1} * \cdots * \nu_{i_k}$. We know that $\operatorname{supp}(m) \subset (a, b)$, with $a = \sum_{j=1}^{k} a_{i_j}$ and $b = \sum_{j=1}^{k} b_{i_j}$. It is
possible to locate precisely $\operatorname{supp}(m)$ using the inequalities (6) and (7).

(a) If $i_j > k$ for some $j$, then $a \geq a_{k+1} > t_k$ and therefore $\operatorname{supp}(m) \subset (t_k, +\infty)$.

(b) If $i_j = k$ for all $j$, then $a = s_k$ and $b = t_k$ and therefore $\operatorname{supp}(m) \subset (s_k, t_k)$.

(c) If $i_j \leq k$ for all $j$ and $i_{j_0} < k$ for some $j_0$, then $b \leq b_{k-1} + (k - 1) b_k < s_k$ and therefore
$\operatorname{supp}(m) \subset [0, s_k)$.

Consequently,

$$\mu^{*k}[t_k, +\infty) = \sum_{i_1, \ldots, i_k} \gamma_{i_1} \cdots \gamma_{i_k} \, \mu_{i_1} * \cdots * \mu_{i_k}[t_k, +\infty) = \sum_{(i_1, \ldots, i_k) \text{ satisfying (a)}} \gamma_{i_1} \cdots \gamma_{i_k} = \nu^{*k}[t_k, +\infty).$$

Moreover, because of (b) and (c), we get that for $s_k \leq t < t_k$,

$$\mu^{*k}[t, t_k) = \gamma_k^k \, \mu_k^{*k}[t, t_k) = \gamma_k^k \, \mu_k^{*k}[t, +\infty),$$

and similarly

$$\nu^{*k}[t, t_k) = \gamma_k^k \, \nu_k^{*k}[t, +\infty).$$

We assumed that $\mu^{*k} \leq_{st} \nu^{*k}$, i.e. $\mu^{*k}[t, +\infty) \leq \nu^{*k}[t, +\infty)$ for all $t$. If $t < t_k$, since $\mu^{*k}[t_k, +\infty) =
\nu^{*k}[t_k, +\infty)$, we get that $\mu^{*k}[t, t_k) \leq \nu^{*k}[t, t_k)$. Since $\gamma_k > 0$, this implies that for all $t \geq s_k$,
$\mu_k^{*k}[t, +\infty) \leq \nu_k^{*k}[t, +\infty)$. This contradicts the fact that $\mu_k^{*k} \not\leq_{st} \nu_k^{*k}$. Therefore $\mu \in \overline{D(\nu)} \setminus D(\nu)$, and
so $D(\nu)$ is not closed. □

We now give an example of what can happen if we consider measures with poor integrability 
properties. 

Proposition 3.7. There exists a probability measure $\nu \in \mathcal{P}(\mathbb{R})$ such that the set

(8) $\{ \mu \in \mathcal{P}(\mathbb{R}) \text{ s.t. } \mu \leq_{st}^{*} \nu \}$

is not convex.

The difference between equation (8) and our definition of $D(\nu)$ is that here we do not suppose the
measures to be exponentially integrable.

Proof. We rely on the following fact which we already alluded to (see [7], p. 479): there exist two
distinct real characteristic functions $\varphi_1$ and $\varphi_2$ such that $\varphi_1^2 = \varphi_2^2$ identically. Consider now the
measures $\mu$ and $\nu$ with respective characteristic functions $\varphi_1$ and $\varphi_2$, i.e. $\varphi_1(t) = \int e^{itx} \, d\mu(x)$ and
$\varphi_2(t) = \int e^{itx} \, d\nu(x)$. Obviously, we have $\nu \leq_{st}^{*} \nu$ and $\mu \leq_{st}^{*} \nu$ since $\mu^{*2} = \nu^{*2}$. Let $\chi = \frac{1}{2}\mu + \frac{1}{2}\nu$ and let
us show that $\chi \not\leq_{st}^{*} \nu$. We have

$$\chi^{*2n} = \frac{1}{2^{2n}} \left( \sum_{k \text{ even}} \binom{2n}{k} \nu^{*2n} + \sum_{k \text{ odd}} \binom{2n}{k} \nu^{*(2n-1)} * \mu \right) = \frac{1}{2} \nu^{*2n} + \frac{1}{2} \nu^{*(2n-1)} * \mu.$$

Thus $\chi^{*2n} \leq_{st} \nu^{*2n}$ is equivalent to $\nu^{*(2n-1)} * \mu \leq_{st} \nu^{*2n}$. Let us show that this is impossible. Indeed, the
measures $\nu^{*(2n-1)} * \mu$ and $\nu^{*2n}$ have real characteristic functions and thus they are symmetric probability
measures. Note however that two symmetric probability distributions cannot be compared with $\leq_{st}$
unless they are equal. But it cannot be that $\nu^{*(2n-1)} * \mu = \nu^{*2n}$ because their characteristic functions
are different ($\varphi_1(\xi) = \varphi_2(\xi)$ iff $\varphi_1(\xi) = 0$). A similar argument shows that $\chi^{*(2n+1)} \not\leq_{st} \nu^{*(2n+1)}$. □

We conclude this section with a few remarks on a relation which is very similar to $\leq_{st}^{*}$. It is the
analogue of catalytic majorization in quantum information theory (see Section 4).

Definition 3.8. Let $\mu, \nu \in \mathcal{P}_{\exp}(\mathbb{R})$. We say that $\mu$ is catalytically stochastically dominated by $\nu$ and
write $\mu \leq_{st}^{C} \nu$ if there exists a probability measure $\pi \in \mathcal{P}_{\exp}(\mathbb{R})$ such that $\mu * \pi \leq_{st} \nu * \pi$.

The following lemma shows a connection between the two relations.

Lemma 3.9. Let $\mu, \nu \in \mathcal{P}_{\exp}(\mathbb{R})$. Assume $\mu \leq_{st}^{*} \nu$. Then $\mu \leq_{st}^{C} \nu$.
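Proof. Assume that $\mu^{*n} \leq_{st} \nu^{*n}$ for some $n$. Let $\pi$ be the probability measure defined by

$$\pi = \frac{1}{n} \sum_{k=0}^{n-1} \mu^{*k} * \nu^{*(n-1-k)}.$$

Let also $\rho$ be the measure defined by

$$\rho = \frac{1}{n} \sum_{k=1}^{n-1} \mu^{*k} * \nu^{*(n-k)}.$$

Then one has $\mu * \pi = \frac{1}{n} \mu^{*n} + \rho$ and $\nu * \pi = \frac{1}{n} \nu^{*n} + \rho$, and since $\mu^{*n} \leq_{st} \nu^{*n}$ this implies $\mu * \pi \leq_{st} \nu * \pi$.
Since $\pi \in \mathcal{P}_{\exp}(\mathbb{R})$, we get $\mu \leq_{st}^{C} \nu$. □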
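
The catalyst built in this proof, $\pi = \frac{1}{n} \sum_{k=0}^{n-1} \mu^{*k} * \nu^{*(n-1-k)}$, can be computed explicitly for finitely supported measures. The sketch below is an illustration under stated assumptions: measures are dictionaries `{atom: mass}` with rational masses, the helper names are hypothetical, and the test data are the measures $0.4\,\delta_0 + 0.6\,\delta_2$ and $0.8\,\delta_1 + 0.2\,\delta_3$ of Example 1.4 with $n = 2$.

```python
from fractions import Fraction

def convolve(mu, nu):
    out = {}
    for a, p in mu.items():
        for b, q in nu.items():
            out[a + b] = out.get(a + b, 0) + p * q
    return out

def power(mu, k):
    out = {0: Fraction(1)}            # Dirac mass at 0
    for _ in range(k):
        out = convolve(out, mu)
    return out

def tail(mu, t):
    return sum(mass for atom, mass in mu.items() if atom >= t)

def dominated(mu, nu):
    return all(tail(mu, t) <= tail(nu, t) for t in set(mu) | set(nu))

def catalyst(mu, nu, n):
    """pi = (1/n) * sum over k = 0..n-1 of mu^{*k} * nu^{*(n-1-k)}."""
    pi = {}
    for k in range(n):
        term = convolve(power(mu, k), power(nu, n - 1 - k))
        for atom, mass in term.items():
            pi[atom] = pi.get(atom, 0) + mass / n
    return pi

mu = {0: Fraction(2, 5), 2: Fraction(3, 5)}
nu = {1: Fraction(4, 5), 3: Fraction(1, 5)}
pi = catalyst(mu, nu, 2)              # here pi = (mu + nu) / 2
print(dominated(mu, nu), dominated(convolve(mu, pi), convolve(nu, pi)))
```

In this example $\mu$ is not dominated by $\nu$, but it becomes dominated after convolution with $\pi$.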

From Theorem 3.3 and Lemma 3.9 one can easily derive the following corollary.

Corollary 3.10. The analogue of Theorem 3.3 is true if we substitute $\leq_{st}^{*}$ with $\leq_{st}^{C}$.

4. Catalytic majorization 

This section is dedicated to the study of the majorization relation, the notion which was the initial
motivation of this work. The majorization relation provides, much as stochastic domination does for
probability measures, a partial order on the set of probability vectors. Originally introduced in linear
algebra [12, 13], it has found many applications in quantum information theory with the work of Nielsen
[13, 14]. We shall not focus on quantum-theoretical aspects of majorization; we refer the interested
reader to [1] and references therein. Here, we study majorization by adapting previously obtained
results for stochastic domination.

The majorization relation is defined for probability vectors, i.e. vectors $x \in \mathbb{R}^d$ with non-negative
components ($x_i \geq 0$) which sum up to one ($\sum_i x_i = 1$). Before defining precisely majorization, let
us introduce some notation. For $d \in \mathbb{N}^*$, let $P_d$ be the set of $d$-dimensional probability vectors:
$P_d = \{ x \in \mathbb{R}^d \text{ s.t. } x_i \geq 0, \sum x_i = 1 \}$. Consider also the set of finitely supported probability vectors
$P_{<\infty} = \bigcup_{d > 0} P_d$. We equip $P_{<\infty}$ with the $\ell_1$ norm defined by $\|x\|_1 = \sum_i |x_i|$. For a vector $x \in P_{<\infty}$,
we write $x_{\max}$ for the largest component of $x$ and $x_{\min}$ for its smallest non-zero component. In this
section we shall consider only finitely supported vectors. For the general case, see Section 6. We
shall identify an element $x \in P_d$ with the corresponding element in $P_{d'}$ ($d' > d$) or $P_{<\infty}$ obtained by
appending null components at the end of $x$.

Next, we define $x^{\downarrow}$, the decreasing rearrangement of a vector $x \in P_d$, as the vector which has the
same coordinates as $x$ up to permutation and such that $x^{\downarrow}_i \geq x^{\downarrow}_{i+1}$ for all $1 \leq i < d$. We can now define
majorization in terms of the ordered vectors:

Definition 4.1. For $x, y \in P_d$ we say that $x$ is majorized by $y$ and we write $x \prec y$ if for all
$k \in \{1, \ldots, d\}$

(9) $\displaystyle \sum_{i=1}^{k} x^{\downarrow}_i \leq \sum_{i=1}^{k} y^{\downarrow}_i.$

Note however that there are several equivalent definitions of majorization which do not use the
ordering of the vectors $x$ and $y$ (see [3] for further details):

Proposition 4.2. The following assertions are equivalent:

(1) $x \prec y$,

(2) $\forall t \in \mathbb{R}, \ \sum_{i=1}^{d} |x_i - t| \leq \sum_{i=1}^{d} |y_i - t|$,

(3) $\forall t \in \mathbb{R}, \ \sum_{i=1}^{d} (x_i - t)^+ \leq \sum_{i=1}^{d} (y_i - t)^+$, where $z^+ = \max(z, 0)$,

(4) There is a bistochastic matrix $B$ such that $x = By$.
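
The partial-sum definition (9) and criterion (3) of Proposition 4.2 are both easy to implement. The following sketch is illustrative: vectors are Python lists of rationals, shorter vectors are padded with zeros as in the identification of $P_d$ with a subset of $P_{d'}$, and criterion (3) is only tested at the breakpoints $t \in \{x_i\} \cup \{y_i\}$, which suffices for probability vectors of equal length because both sides are piecewise linear in $t$ with the same slope to the left of the smallest breakpoint. The function names are assumptions of this sketch.

```python
from fractions import Fraction

def pad(x, d):
    """Append null components so that x has length d."""
    return list(x) + [0] * (d - len(x))

def majorized(x, y):
    """x ≺ y through partial sums of the decreasing rearrangements."""
    d = max(len(x), len(y))
    xs = sorted(pad(x, d), reverse=True)
    ys = sorted(pad(y, d), reverse=True)
    sx = sy = 0
    for xi, yi in zip(xs, ys):
        sx += xi
        sy += yi
        if sx > sy:
            return False
    return True

def hinge_sum(x, t):
    """Sum of (x_i - t)^+ over the coordinates of x."""
    return sum(max(xi - t, 0) for xi in x)

def majorized_hinge(x, y):
    """x ≺ y through criterion (3), tested at the breakpoints only."""
    d = max(len(x), len(y))
    xs, ys = pad(x, d), pad(y, d)
    return all(hinge_sum(xs, t) <= hinge_sum(ys, t) for t in set(xs) | set(ys))

x = [Fraction(1, 2), Fraction(1, 2)]
y = [Fraction(9, 10), Fraction(1, 10)]
print(majorized(x, y), majorized_hinge(x, y), majorized(y, x))
```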

There are two operations on probability vectors which are of particular interest to us: the tensor
product and the direct sum. For $x = (x_1, \ldots, x_d) \in P_d$ and $x' = (x'_1, \ldots, x'_{d'}) \in P_{d'}$, we define the
tensor product $x \otimes x'$ as the vector $(x_i x'_j)_{ij} \in P_{dd'}$. We also define the direct sum $x \oplus x'$ as the
concatenated vector $(x_1, \ldots, x_d, x'_1, \ldots, x'_{d'}) \in \mathbb{R}^{d+d'}$. Note that if we take $\oplus$-convex combinations, we
get probability vectors: $\lambda x \oplus (1 - \lambda) x' \in P_{d+d'}$.

The construction which permits us to use tools from stochastic domination in the framework of
majorization is the following (inspired by [11]): to a probability vector $z \in P_{<\infty}$ we associate a
probability measure $\mu_z$ defined by:

$$\mu_z = \sum_{i : z_i > 0} z_i \, \delta_{\log z_i}.$$

These measures behave well with respect to tensor products:

$$\mu_{x \otimes y} = \mu_x * \mu_y.$$

The connection between majorization and stochastic domination is provided by the following lemma:

Lemma 4.3. Let $x, y \in P_{<\infty}$. Assume that $\mu_x \leq_{st} \mu_y$. Then $x \prec y$.

Proof. We can assume that $x = x^{\downarrow}$ and $y = y^{\downarrow}$. Note that

$$\mu_x[t, \infty) = \sum_{i : \log x_i \geq t} x_i = \sum_{i : x_i \geq \exp(t)} x_i.$$

Thus, for all $u > 0$, $\sum_{i : x_i \geq u} x_i \leq \sum_{i : y_i \geq u} y_i$. To start, use $u = y_1$ to conclude that $x_1 \leq y_1$. Notice
that it suffices to show that $\sum_{i=1}^{k} x_i \leq \sum_{i=1}^{k} y_i$ only for those $k$ such that $x_k > y_k$ (indeed, if $x_k \leq y_k$,
the $k$-th inequality in (9) can be deduced from the $(k-1)$-th inequality). Consider such a $k$ and let
$x_k > u > y_k$. We get:

$$\sum_{i=1}^{k} x_i \leq \sum_{i : x_i \geq u} x_i \leq \sum_{i : y_i \geq u} y_i \leq \sum_{i=1}^{k} y_i,$$

which completes the proof of the lemma. □

Remark 4.4. The converse of this lemma does not hold. Indeed, consider $x = (0.5, 0.5)$ and $y =
(0.9, 0.1)$. Obviously, $x \prec y$ but $1 = \mu_x[\log 0.5, \infty) > \mu_y[\log 0.5, \infty) = 0.9$ and thus $\mu_x \not\leq_{st} \mu_y$.
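
Remark 4.4 can be replayed numerically. The sketch below is illustrative and makes two assumptions: the measure attached to a vector $z$ places mass $z_i$ at $\log z_i$, so that its tail at $\log u$ is the sum of the $z_i \geq u$ (this avoids computing logarithms), and the helper names are hypothetical. It shows a pair with $x \prec y$ for which the associated measures are not stochastically ordered.

```python
from fractions import Fraction

def tail_mu(z, u):
    """mu_z[log u, +oo): total mass of the coordinates z_i >= u, for u > 0."""
    return sum(zi for zi in z if zi >= u)

def st_dominated(x, y):
    """mu_x <=_st mu_y, tested at the thresholds given by the nonzero coordinates."""
    thresholds = {v for v in list(x) + list(y) if v > 0}
    return all(tail_mu(x, u) <= tail_mu(y, u) for u in thresholds)

def majorized(x, y):
    """x ≺ y through partial sums, padding the shorter vector with zeros."""
    d = max(len(x), len(y))
    xs = sorted(list(x) + [0] * (d - len(x)), reverse=True)
    ys = sorted(list(y) + [0] * (d - len(y)), reverse=True)
    sx = sy = 0
    for xi, yi in zip(xs, ys):
        sx += xi
        sy += yi
        if sx > sy:
            return False
    return True

x = [Fraction(1, 2), Fraction(1, 2)]
y = [Fraction(9, 10), Fraction(1, 10)]
print(majorized(x, y), st_dominated(x, y))   # majorized, but not stochastically dominated
```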

We can describe the majorization relation by the sets:

$$S_d(y) = \{ x \in P_d \text{ s.t. } x \prec y \},$$

where $y$ is a finitely supported probability vector. Mathematically, such a set is characterized by the
following lemma, which is a simple consequence of Birkhoff's theorem on bistochastic matrices:

Lemma 4.5. For $y$ a $d$-dimensional probability vector, the set $S_d(y)$ is a polytope whose extreme points
are $y$ and its permutations.

The initial motivation for our work was the following phenomenon discovered in quantum information
theory (see [10] and respectively [2]). It turns out that additional vectors can act as catalysts for the
majorization relation: there are vectors $x, y, z \in P_{<\infty}$ such that $x \not\prec y$ but $x \otimes z \prec y \otimes z$; in such a
situation we say that $x$ is catalytically majorized (or trumped) by $y$ and we write $x \prec_T y$. Another
form of catalysis is provided by multiple copies of vectors: we can find vectors $x$ and $y$ such that
$x \not\prec y$ but still, for some $n \geq 2$, $x^{\otimes n} \prec y^{\otimes n}$; in this case we write $x \prec_M y$. We have thus two new
order relations on probability vectors, analogues of $\leq_{st}^{C}$ and respectively $\leq_{st}^{*}$. As before, for $y \in P_d$, we
introduce the sets

$$T_d(y) = \{ x \in P_d \text{ s.t. } x \prec_T y \},$$

and

$$M_d(y) = \{ x \in P_d \text{ s.t. } x \prec_M y \}.$$
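
A concrete instance of catalysis may help here. The vectors below are a standard example from the quantum information literature on entanglement catalysis; they do not appear in the present text and are used only as an illustration, and the helper names are assumptions of this sketch. The code checks with exact arithmetic that $x \not\prec y$ while $x \otimes z \prec y \otimes z$.

```python
from fractions import Fraction as F

def majorized(x, y):
    """x ≺ y through partial sums, padding the shorter vector with zeros."""
    d = max(len(x), len(y))
    xs = sorted(list(x) + [0] * (d - len(x)), reverse=True)
    ys = sorted(list(y) + [0] * (d - len(y)), reverse=True)
    sx = sy = 0
    for xi, yi in zip(xs, ys):
        sx += xi
        sy += yi
        if sx > sy:
            return False
    return True

def tensor(x, z):
    """Tensor product: all pairwise products x_i * z_j."""
    return [xi * zj for xi in x for zj in z]

x = [F(4, 10), F(4, 10), F(1, 10), F(1, 10)]
y = [F(1, 2), F(1, 4), F(1, 4), F(0)]
z = [F(6, 10), F(4, 10)]                      # the catalyst

print(majorized(x, y), majorized(tensor(x, z), tensor(y, z)))
```

Here the second partial sums are $0.8 > 0.75$, so $x \not\prec y$, whereas every partial sum of $(x \otimes z)^{\downarrow}$ is at most the corresponding one for $(y \otimes z)^{\downarrow}$.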
It turns out that the relations $\prec_T$ and $\prec_M$ (and thus the sets $T_d(y)$ and $M_d(y)$) are not as simple as $\prec$
and $S_d(y)$. It is known that the inclusion $M_d(y) \subset T_d(y)$ holds (this is the analogue of Lemma 3.9) and
that it can be strict [8]. In general, the sets $T_d(y)$ and $M_d(y)$ are neither closed nor open, and although
$T_d(y)$ is known to be convex, nothing is known about the convexity of $M_d(y)$ (such questions have
been intensively studied in the physical literature; see [11 [6] and the references therein). As explained 
in [1] it is natural from a mathematical point of view to introduce the sets $T_{<\infty}(y) = \bigcup_{d \in \mathbb{N}} T_d(y)$ and
$M_{<\infty}(y) = \bigcup_{d \in \mathbb{N}} M_d(y)$. A key notion in characterizing them is Schur-convexity:

Definition 4.6. A function $f : P_d \to \mathbb{R}$ is said to be

• Schur-convex if $f(x) \leq f(y)$ whenever $x \prec y$,

• Schur-concave if $f(x) \geq f(y)$ whenever $x \prec y$,

• strictly Schur-convex if $f(x) < f(y)$ whenever $x \precneqq y$,

• strictly Schur-concave if $f(x) > f(y)$ whenever $x \precneqq y$,

where $x \precneqq y$ means $x \prec y$ and $x^{\downarrow} \neq y^{\downarrow}$.

Examples are provided as follows: if $\Phi : \mathbb{R} \to \mathbb{R}$ is a (strictly) convex/concave function, then the
function $h : P_d \to \mathbb{R}$ defined by $h(x_1, \ldots, x_d) = \Phi(x_1) + \cdots + \Phi(x_d)$ is (strictly)
Schur-convex/Schur-concave.

For $x \in P_d$ and $p \in \mathbb{R}$, we define $N_p(x)$ as

$$N_p(x) = \sum_{i : x_i > 0} x_i^p.$$

We will also use the Shannon entropy $H$:

$$H(x) = -\sum_{i=1}^{d} x_i \log x_i.$$

Note that $-H(x)$ is the derivative of $p \mapsto N_p(x)$ at $p = 1$ and that $N_0(x)$ is the number of non-zero
components of the vector $x$. These functions satisfy the following properties:

(1) If $p > 1$, $N_p$ is strictly Schur-convex on $P_{<\infty}$.

(2) If $0 < p < 1$, $N_p$ is strictly Schur-concave on $P_{<\infty}$.

(3) If $p < 0$, $N_p$ is strictly Schur-convex on $P_d$ for any $d$. However, for $p < 0$, it is not possible to
compare vectors with a different number of non-zero components.

(4) $H$ is strictly Schur-concave on $P_{<\infty}$.
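
These quantities are straightforward to compute. The sketch below is an illustration with hypothetical helper names: it assumes $N_p(x)$ is the sum of $x_i^p$ over the nonzero coordinates and $H$ is the Shannon entropy with natural logarithm, checks numerically that the derivative of $p \mapsto N_p(x)$ at $p = 1$ is $-H(x)$, and evaluates properties (1) to (4) on the pair $x = (0.5, 0.5) \prec y = (0.9, 0.1)$ of Remark 4.4.

```python
import math

def N(p, x):
    """N_p(x): sum of x_i ** p over the nonzero coordinates of x."""
    return sum(xi ** p for xi in x if xi > 0)

def H(x):
    """Shannon entropy with natural logarithm."""
    return -sum(xi * math.log(xi) for xi in x if xi > 0)

x = [0.5, 0.5]
y = [0.9, 0.1]          # x is majorized by y

h = 1e-5
derivative_at_1 = (N(1 + h, x) - N(1 - h, x)) / (2 * h)   # central difference
print(derivative_at_1, -H(x))

print(N(2, x) < N(2, y),        # p > 1: Schur-convex
      N(0.5, x) > N(0.5, y),    # 0 < p < 1: Schur-concave
      N(-1, x) < N(-1, y),      # p < 0: Schur-convex (same number of nonzero entries)
      H(x) > H(y))              # entropy: Schur-concave
```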

One possible way of describing the relations $\prec_M$ and $\prec_T$ is to find a family (the smallest possible) of
Schur-convex functions which characterizes them. In this direction, Nielsen conjectured the following
result:

Conjecture 4.7. Fix a vector $y \in P_d$, with nonzero coordinates. Then $\overline{T_d(y)} = \overline{M_d(y)}$ and they both
are equal to the set of $x \in P_d$ satisfying

(C1) For $p \geq 1$, $N_p(x) \leq N_p(y)$.

(C2) For $0 < p \leq 1$, $N_p(x) \geq N_p(y)$.

(C3) For $p < 0$, $N_p(x) \leq N_p(y)$.
Here, the closures are taken in $\mathbb{R}^d$ (recall that neither $M_d(y)$ nor $T_d(y)$ is closed). By the previous
remarks, any vector in $T_d(y)$ or $M_d(y)$ (and by continuity, also in the closures) must satisfy conditions
(C1-C3). Recently, Turgut [18, 19] provided a complete characterization of the set $T_d(y)$, which implies
in particular that Nielsen's conjecture is true for $T_d(y)$. His method, completely different from ours,
consists in solving a discrete approximation of the problem using elementary algebraic techniques.
Note however that the inclusion $M_d(y) \subset T_d(y)$ is strict in general, and thus the characterization of
$M_d(y)$ is still open. We shall now focus on the set $M_d(y)$. Conjecture 4.7 can be reformulated as
follows: if $x, y \in P_d$ satisfy (C1-C3), then there exists a sequence $(x_n)$ in $M_d(y)$ such that $(x_n)$
converges to $x$. If we relax the condition that $x_n$ and $y$ have the same dimension, we can prove the
following two theorems:

Theorem 4.8. If $x, y \in P_d$ satisfy (C1), then there exists a sequence $(x_n)$ in $M_{<\infty}(y)$ such that
$(x_n)$ converges to $x$ in $\ell_1$-norm.

Theorem 4.9. If $x, y \in P_d$ satisfy (C1-C2), then there exists a sequence $(x_n)$ in $M_{d+1}(y)$ such
that $(x_n)$ converges to $x$.

Since $M_d(y) \subset T_d(y)$, both theorems have direct analogues for $T_{<\infty}(y)$ and respectively $T_{d+1}(y)$.
Theorem 4.8 restates the authors' previous result in [1]; however, the proof presented in the next
section is more transparent than the previous one. Theorem 4.9 answers a question of [1]. It is an
intermediate result between Theorem 4.8 and Conjecture 4.7.



5. Proof of the theorems 






We show here how to derive Theorems 4.8 and 4.9. We first state a proposition which is the
translation of Proposition 2.5 in terms of majorization.

Proposition 5.1. Let $x, y \in P_{<\infty}$. Assume that $x$ and $y$ have nonzero coordinates, and respective
dimensions $d_x$ and $d_y$. Assume that

(1) $x_{\min} < y_{\min}$.

(2) $x_{\max} < y_{\max}$.

(3) $H(x) > H(y)$.

(4) $N_p(x) < N_p(y)$ for all $p \in \, ]1, +\infty[$.

(5) $N_p(x) > N_p(y)$ for all $p \in \, ]-\infty, 1[$.

Then there exists an integer $N$ such that for all $n \geq N$, we have $x^{\otimes n} \prec y^{\otimes n}$.

It is important to notice that since $N_0(x) = d_x$ and $N_0(y) = d_y$, the conditions of the proposition
can be satisfied only when $d_x > d_y$. This is the main reason why our approach fails to prove Conjecture
4.7.

Proof. One checks that the probability measures $\mu_x$ and $\mu_y$ associated to the vectors $x$ and $y$ satisfy
the hypotheses of Proposition 2.5. Indeed, for $p \in \mathbb{R}$, one has

$$N_p(x) = \int e^{\lambda t} \, d\mu_x(t), \quad \text{with } \lambda = p - 1.$$

As $\mu_x^{*n} = \mu_{x^{\otimes n}}$, there exists an integer $N$ such that for $n \geq N$, we have $\mu_{x^{\otimes n}} \leq_{st} \mu_{y^{\otimes n}}$. It remains to
apply Lemma 4.3 in order to complete the proof. □
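The two identities used in this proof can be checked numerically. The sketch below assumes that the measure associated with $x$ is $\mu_x = \sum_i x_i \delta_{\log x_i}$; this definition is not restated in this section, so it is an assumption here, and the function names are illustrative. A finitely supported measure is stored as a pair of arrays (atoms, weights).

```python
import numpy as np

def measure_of(x):
    """Atoms and weights of mu_x = sum_i x_i * delta_{log x_i} (assumed definition)."""
    x = np.asarray(x, dtype=float)
    return np.log(x), x

def laplace(measure, lam):
    """Integral of exp(lam * t) against a finitely supported measure."""
    atoms, weights = measure
    return float(np.sum(weights * np.exp(lam * atoms)))

def convolve(m1, m2):
    """Convolution of two finitely supported measures: atoms add, weights multiply."""
    a1, w1 = m1
    a2, w2 = m2
    return np.add.outer(a1, a2).ravel(), np.multiply.outer(w1, w2).ravel()

def N(x, p):
    """N_p(x) = sum_i x_i^p for a vector with nonzero coordinates."""
    return float(np.sum(np.asarray(x, dtype=float) ** p))

x = np.array([0.5, 0.3, 0.2])
mu_x = measure_of(x)

# N_p(x) equals the Laplace transform of mu_x at lam = p - 1.
for p in (-2.0, 0.0, 0.5, 1.0, 3.0):
    print(p, N(x, p), laplace(mu_x, p - 1.0))

# mu_x * mu_x has the same atoms and weights as the measure of the Kronecker square.
conv = convolve(mu_x, mu_x)
kron = measure_of(np.kron(x, x))
print(np.allclose(conv[0], kron[0]), np.allclose(conv[1], kron[1]))
```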



The main idea used in the following proofs is to slightly modify the vector $x$ so that the pair $(x, y)$
satisfies the hypotheses of Proposition 5.1.

Proof of Theorem 4.8. Let $x, y \in P_d$ satisfying $N_p(x) \leq N_p(y)$ for all $p \geq 1$. Since $N_1(x) = N_1(y) = 1$
and $-H = \frac{dN_p}{dp}\big|_{p=1}$, we also have $-H(x) \leq -H(y)$. For $0 < \varepsilon < \frac{d}{d+1} x_{\min}$, define $x_\varepsilon \in P_{d+1}$ by

$$x_\varepsilon = \left( x_1 - \frac{\varepsilon}{d}, \ldots, x_d - \frac{\varepsilon}{d}, \varepsilon \right).$$

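One checks that $x_\varepsilon \prec x$ and therefore $N_p(x_\varepsilon) < N_p(x) \leq N_p(y)$ for any $p > 1$, and $-H(x_\varepsilon) < -H(x) \leq -H(y)$.
Since $-H = \frac{dN_p}{dp}\big|_{p=1}$ and the function $p \mapsto N_p(\cdot)$ is continuous, this means that there exists
some $0 < p_\varepsilon < 1$ such that $N_p(x_\varepsilon) \geq N_p(y)$ for any $p \in [p_\varepsilon, 1]$. Choose an integer $k > 2$, depending on
$\varepsilon$, such that

$$k > \max\left\{ d^{1/(1-p_\varepsilon)} \, \varepsilon^{-p_\varepsilon/(1-p_\varepsilon)}, \; \frac{\varepsilon}{y_{\min}}, \; d \right\}$$

and define $x_{\varepsilon,k} \in P_{<\infty}$ by

$$x_{\varepsilon,k} = \Big( x_1 - \frac{\varepsilon}{d}, \ldots, x_d - \frac{\varepsilon}{d}, \underbrace{\frac{\varepsilon}{k}, \ldots, \frac{\varepsilon}{k}}_{k \text{ times}} \Big).$$

For any $0 \leq p \leq p_\varepsilon$ we have

$$N_p(x_{\varepsilon,k}) > k \left( \frac{\varepsilon}{k} \right)^p \geq d \geq N_p(y),$$

and for any $p < 0$ we have

$$N_p(x_{\varepsilon,k}) > k \left( \frac{\varepsilon}{k} \right)^p \geq d \, y_{\min}^p \geq N_p(y).$$

We also have $x_{\varepsilon,k} \prec x_\varepsilon$ and therefore $N_p(x_{\varepsilon,k}) > N_p(x_\varepsilon) \geq N_p(y)$ for $p_\varepsilon \leq p < 1$. Similarly,
$N_p(x_{\varepsilon,k}) < N_p(x_\varepsilon) \leq N_p(y)$ for $p > 1$. This means that $x_{\varepsilon,k}$ and $y$ satisfy the hypotheses of Proposition
5.1, and therefore $x_{\varepsilon,k} \in M_{<\infty}(y)$. Since $\|x_{\varepsilon,k} - x\|_1 \leq 2\varepsilon$ and $\varepsilon$ can be chosen arbitrarily small, this
completes the proof of the theorem. □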
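The construction in this proof is easy to reproduce numerically. The sketch below builds $x_\varepsilon$ and $x_{\varepsilon,k}$ for a hypothetical pair $x \prec y$ and checks the facts the proof relies on: $x_{\varepsilon,k} \prec x_\varepsilon \prec x$, the distance $\|x_{\varepsilon,k} - x\|_1 = 2\varepsilon$ once $x$ is padded with zeros, and the lower bounds on $N_p(x_{\varepsilon,k})$ for $p \leq p_\varepsilon$. The value `p_eps` is simply assumed here (the proof obtains it from a continuity argument), and `sufficient_k` encodes this sketch's own reading of a sufficient size for $k$, namely $k(\varepsilon/k)^p \geq d$ on $[0, p_\varepsilon]$ and $k(\varepsilon/k)^p \geq d\,y_{\min}^p$ for $p < 0$.

```python
import math
import numpy as np

def N(x, p):
    """N_p(x) = sum_i x_i^p for a vector with nonzero coordinates."""
    return float(np.sum(np.asarray(x, dtype=float) ** p))

def is_majorized(x, y, tol=1e-12):
    """True if x is majorized by y (shorter vector padded with zeros)."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]
    ys = np.sort(np.asarray(y, dtype=float))[::-1]
    n = max(len(xs), len(ys))
    xs = np.pad(xs, (0, n - len(xs)))
    ys = np.pad(ys, (0, n - len(ys)))
    return bool(np.all(np.cumsum(xs) <= np.cumsum(ys) + tol))

def spread(x, eps, k=1):
    """(x_1 - eps/d, ..., x_d - eps/d, eps/k, ..., eps/k) with k copies of eps/k."""
    x = np.asarray(x, dtype=float)
    d = len(x)
    if not 0 < eps < d / (d + 1) * x.min():
        raise ValueError("eps must lie in (0, d/(d+1) * x_min)")
    return np.concatenate([x - eps / d, np.full(k, eps / k)])

def sufficient_k(d, eps, p_eps, y_min):
    """Smallest integer strictly above the three lower bounds used in this sketch."""
    bound = max(d ** (1 / (1 - p_eps)) * eps ** (-p_eps / (1 - p_eps)), eps / y_min, d)
    return math.floor(bound) + 1

x = np.array([0.5, 0.3, 0.2])      # hypothetical example with x majorized by y
y = np.array([0.6, 0.3, 0.1])
d, eps, p_eps = len(x), 0.03, 0.5  # p_eps is an assumed value, not computed

k = sufficient_k(d, eps, p_eps, y.min())
x_eps = spread(x, eps)
x_eps_k = spread(x, eps, k)
dist = float(np.sum(np.abs(x_eps_k - np.pad(x, (0, k)))))
print(k, dist)
```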

Proof of Theorem 4.9. Let $x, y \in P_d$ satisfying $N_p(x) \leq N_p(y)$ for $p \geq 1$ and $N_p(x) \geq N_p(y)$ for
$0 \leq p \leq 1$. As in the previous proof, we consider for $0 < \varepsilon < \frac{d}{d+1} x_{\min}$ the vector $x_\varepsilon$ defined as

$$x_\varepsilon = \left( x_1 - \frac{\varepsilon}{d}, \ldots, x_d - \frac{\varepsilon}{d}, \varepsilon \right).$$

We are going to show using Proposition 5.1 that for $\varepsilon$ small enough, $x_\varepsilon$ is in $M_{d+1}(y)$. Note that
$x_\varepsilon \prec x$, and therefore $N_p(x_\varepsilon) < N_p(x) \leq N_p(y)$ for $p > 1$, and $N_p(x_\varepsilon) > N_p(x) \geq N_p(y)$ for $0 < p < 1$.
Also, since $N_0(x_\varepsilon) = d + 1$ and $N_0(y) = d$, there exists by continuity a number $p_0 < 0$ (not depending
on $\varepsilon$) such that $N_p(y) < d + 1$ for all $p \in [p_0, 0]$. Thus for $p \in [p_0, 0]$ we have

$$N_p(x_\varepsilon) \geq N_0(x_\varepsilon) = d + 1 > N_p(y).$$

It remains to notice that for $\varepsilon < d^{1/p_0} \, y_{\min}$, we have for any $p \leq p_0$

$$N_p(x_\varepsilon) > \varepsilon^p \geq d \, y_{\min}^p \geq N_p(y).$$

We checked that $x_\varepsilon$ and $y$ satisfy the hypotheses of Proposition 5.1, and therefore $x_\varepsilon \in M_{d+1}(y)$. Since
$\|x_\varepsilon - x\|_1 \leq 2\varepsilon$ and $\varepsilon$ can be chosen arbitrarily small, this completes the proof of the theorem. □
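The thresholds in this proof can be made concrete on a small example. The sketch below takes the hypothetical vectors $x = (0.4, 0.3, 0.3)$ and $y = (0.5, 0.25, 0.25)$ in $P_3$ (here $x \prec y$, so the conditions on $N_p$ for $p > 0$ hold automatically), picks $p_0 = -0.2$ by hand, and checks on a grid that $x_\varepsilon$ with $\varepsilon = 0.001 < d^{1/p_0} y_{\min}$ satisfies the five strict conditions of Proposition 5.1. All names and numerical values are illustrative assumptions, and a grid check only samples conditions (4) and (5).

```python
import numpy as np

def N(x, p):
    """N_p(x) = sum_i x_i^p for a vector with nonzero coordinates."""
    return float(np.sum(np.asarray(x, dtype=float) ** p))

def entropy(x):
    """Shannon entropy H(x) = -sum_i x_i log x_i."""
    x = np.asarray(x, dtype=float)
    return float(-np.sum(x * np.log(x)))

def hypotheses_hold_on_grid(x, y, p_grid):
    """Conditions (1)-(3) exactly; (4)-(5) only at the grid points supplied."""
    ok = x.min() < y.min() and x.max() < y.max() and entropy(x) > entropy(y)
    for p in p_grid:
        if p > 1:
            ok = ok and N(x, p) < N(y, p)
        elif p < 1:
            ok = ok and N(x, p) > N(y, p)
    return bool(ok)

x = np.array([0.4, 0.3, 0.3])
y = np.array([0.5, 0.25, 0.25])
d = len(x)

p0 = -0.2                                 # chosen by hand so that N_p(y) < d + 1 on [p0, 0]
eps_threshold = d ** (1 / p0) * y.min()   # roughly 1.03e-3 for these values
eps = 0.001                               # below both thresholds used in the proof

x_eps = np.concatenate([x - eps / d, [eps]])
p_grid = np.concatenate([np.linspace(-50, 0.99, 400), np.linspace(1.01, 50, 400)])
print(N(y, p0) < d + 1, eps < eps_threshold)
print(hypotheses_hold_on_grid(x_eps, y, p_grid))
```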

6. Infinite dimensional catalysis

In light of the recent paper [15], we investigate the majorization relation and its generalizations for 
infinitely-supported probability vectors. Let us start by adapting the key tools used in the previous 
section to this non-finite setting. 

First, note that when defining the decreasing rearrangement $x^{\downarrow}$ of a vector $x$, we shall ask that
only the non-zero components of $x$ and $x^{\downarrow}$ should be the same up to permutation. The majorization
relation $\prec$ extends trivially to $P_\infty$, the set of (possibly infinite) probability vectors. The same holds
for the relations $\prec_M$ and $\prec_T$ (note however that for $\prec_T$, we now allow infinite-dimensional catalysts).

Note that for a general probability vector, there is no reason that $N_p$ for $p \in (0, 1)$ or $H$ should be
finite. We thus have to replace the hypothesis (C1) by the following one:
(C1') For $p > 1$, $N_p(x) \leq N_p(y)$ and $H(x) < \infty$.

Notice however that the inequalities $N_p(x) \leq N_p(y)$ for $p \to 1^+$ imply that $H(y) \leq H(x) < \infty$ and
thus both entropies are finite.

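Theorem 6.1. If $x, y \in P_\infty$ satisfy (C1'), then, for all $\varepsilon > 0$, there exist finitely supported vectors
$x_\varepsilon, y_\varepsilon \in P_{<\infty}$ and $n \in \mathbb{N}$ such that $\|x - x_\varepsilon\|_1 \leq \varepsilon$, $\|y - y_\varepsilon\|_1 \leq \varepsilon$ and $x_\varepsilon^{\otimes n} \prec y_\varepsilon^{\otimes n}$.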

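Proof. Fix $\varepsilon > 0$ small enough. If $y$ has infinite support, consider the truncated vector
$y_\varepsilon = (y_1 + R(\varepsilon), y_2, \ldots, y_{N(\varepsilon)})$, where $N(\varepsilon)$ and $R(\varepsilon)$ are such that $R(\varepsilon) = \sum_{i=N(\varepsilon)+1}^{\infty} y_i \leq \varepsilon$; otherwise put $y_\varepsilon = y$.
Clearly, we have $\|y - y_\varepsilon\|_1 \leq 2\varepsilon$ and $N_p(y_\varepsilon) \geq N_p(y)$ for all $p > 1$. If the vector $x$ is finite, use Theorem
4.8 with $x_\varepsilon = x$ and $y_\varepsilon$ to conclude. Otherwise, consider $M(\varepsilon)$ such that $S(\varepsilon) = \sum_{i=M(\varepsilon)+1}^{\infty} x_i \leq \varepsilon$
and define the vector

$$x_\varepsilon = \Big( x_1, x_2, \ldots, x_{M(\varepsilon)}, \underbrace{\frac{S(\varepsilon)}{k}, \frac{S(\varepsilon)}{k}, \ldots, \frac{S(\varepsilon)}{k}}_{k \text{ times}} \Big),$$

where $k$ is a constant depending on $\varepsilon$ which will be chosen later. For all $k \geq 1$, $x_\varepsilon$ is a finite vector
of size $M(\varepsilon) + k$ and we have $\|x - x_\varepsilon\|_1 \leq 2\varepsilon$. Let us now show that we can choose $k$ such that
$N_p(x_\varepsilon) \leq N_p(x)$ for all $p \geq 1$. In order to do this, consider the function $\varphi : (1, \infty) \to \mathbb{R}_+$,

$$\varphi(p) = \left( \frac{S(\varepsilon)^p}{\sum_{i=M(\varepsilon)+1}^{\infty} x_i^p} \right)^{1/(p-1)}.$$

The function $\varphi$ takes finite values on $(1, \infty)$ and $\lim_{p \to \infty} \varphi(p) = S(\varepsilon)/x_{M(\varepsilon)+1} < \infty$. Moreover, as the
Shannon entropy of $x$ is finite, one can also show that $\lim_{p \to 1^+} \varphi(p) < \infty$. Thus, the function $\varphi$ is
bounded and we can choose $k \in \mathbb{N}$ such that $k \geq \varphi(p)$ for all $p > 1$. This implies that

$$N_p(x_\varepsilon) - N_p(x) = k \left( \frac{S(\varepsilon)}{k} \right)^p - \sum_{i=M(\varepsilon)+1}^{\infty} x_i^p \leq 0.$$

In conclusion, we have found two finitely supported vectors $x_\varepsilon$ and $y_\varepsilon$ such that $\|x - x_\varepsilon\|_1 \leq 2\varepsilon$,
$\|y - y_\varepsilon\|_1 \leq 2\varepsilon$ and $N_p(x_\varepsilon) \leq N_p(y_\varepsilon)$ for all $p > 1$. To conclude, it suffices to apply Theorem 4.8 to $x_\varepsilon$
and $y_\varepsilon$. □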
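To see how the constant $k$ can be chosen in practice, consider the hypothetical geometric vector $x_i = 2^{-i}$, $i \geq 1$, whose tail power sums have a closed form. The sketch below assumes $\varphi(p) = \big( S^p / \sum_{i > M} x_i^p \big)^{1/(p-1)}$, which is the form that makes $k \geq \varphi(p)$ equivalent to $k (S/k)^p \leq \sum_{i > M} x_i^p$; the truncation index and all names are illustrative. For this $x$ one gets $\varphi(p) = (2^p - 1)^{1/(p-1)}$, which decreases from $4$ (as $p \to 1^+$) to $2 = S/x_{M+1}$ (as $p \to \infty$), so $k = 4$ is enough.

```python
import math
import numpy as np

M = 5                      # hypothetical truncation index M(eps)
S = 2.0 ** (-M)            # tail mass: sum over i > M of 2^{-i}

def tail_power_sum(p):
    """Sum over i > M of (2^{-i})^p, computed as a geometric series."""
    return 2.0 ** (-p * (M + 1)) / (1.0 - 2.0 ** (-p))

def phi(p):
    """phi(p) = (S^p / tail_power_sum(p))^(1/(p-1)), assumed form, for p > 1."""
    return (S ** p / tail_power_sum(p)) ** (1.0 / (p - 1.0))

def gap(p, k):
    """N_p(x_eps) - N_p(x) = k * (S/k)^p - tail_power_sum(p)."""
    return k * (S / k) ** p - tail_power_sum(p)

p_grid = np.linspace(1.001, 60.0, 2000)
k = math.ceil(max(phi(p) for p in p_grid))   # bound phi on the grid, round up

print(k)
print(all(gap(p, k) <= 0 for p in p_grid))
```

The grid only samples $\varphi$; in this example the closed form confirms that the grid maximum is not exceeded elsewhere on $(1, \infty)$.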